A meta-heuristic density-based subspace clustering algorithm for high-dimensional data

Abstract

Subspace clustering is one of the efficient techniques for determining clusters in different subsets of dimensions. Ideally, these techniques should find all possible non-redundant clusters in which a data point participates. Unfortunately, existing hard subspace clustering algorithms fail to satisfy this property. Additionally, with the increase in the dimensions of data, classical algorithms become inefficient. This work presents a new density-based subspace clustering algorithm (S_FAD) to overcome the drawbacks of these algorithms. S_FAD is based on a bottom-up approach and finds clusters of varied density using the parameters of the DBSCAN algorithm. The algorithm optimizes the DBSCAN parameters through a hybrid meta-heuristic and uses hashing concepts to discover clusters. The efficacy of S_FAD is evaluated against various algorithms on artificial and real datasets in terms of F_Score and rand_index. Performance is assessed on three parameters: average ranking, SRR ranking, and scalability. Statistical analysis is performed using the Wilcoxon signed-rank test. Results reveal that S_FAD performs considerably better on the majority of datasets and scales well up to 6400 dimensions on an actual dataset.
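
The abstract names rand_index as one of its evaluation measures. As a minimal sketch of what that measure computes, the following assumes the textbook pairwise Rand index over two hard labelings, where each point carries exactly one label. The paper's exact evaluation protocol for subspace clusters is not given here, and the function name `rand_index` is only illustrative.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two hard labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    Assumption: one label per point (no overlapping clusters).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # Zero or one point: the labelings trivially agree.
        return 1.0

    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)
```

The score depends only on how points are grouped, not on the label values, so relabeling the clusters leaves it unchanged. Because subspace clustering can assign a point to several clusters, a real evaluation would need a variant that handles overlapping membership.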

Similar resources

Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms

Finding clusters in high dimensional data is a challenging task, as such data comprises hundreds of attributes. Subspace clustering is an evolving methodology which, instead of finding clusters in the entire feature space, aims at finding clusters in various overlapping or non-overlapping subspaces of the high dimensional dataset. Density based subspace clustering algorithms t...

A Fuzzy Subspace Algorithm for Clustering High Dimensional Data

In fuzzy clustering algorithms each object has a fuzzy membership associated with each cluster indicating the degree of association of the object to the cluster. Here we present a fuzzy subspace clustering algorithm, FSC, in which each dimension has a weight associated with each cluster indicating the degree of importance of the dimension to the cluster. Using fuzzy techniques for subspace clus...

Density-Connected Subspace Clustering for High-Dimensional Data

Several application domains such as molecular biology and geography produce a tremendous amount of data which can no longer be managed without the help of efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because most real-world data sets are characterized by a high...

Subspace Clustering for High Dimensional Categorical Data

A fundamental operation in data mining is to partition a given dataset into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria [2]. These criteria are usually defined in the form of some distance, and similarity is hence defined as follows: the smaller the distance is, the more similar the objects a...

Soft Subspace Clustering for High-Dimensional Data

High dimensional data is a phenomenon in real-world data mining applications. Text data is a typical example. In text mining, a text document is viewed as a vector of terms whose dimension is equal to the total number of unique terms in a data set, which is usually in the thousands. High dimensional data occurs in business as well. In retail, for example, to effectively manage supplier relationshi...

Journal

Journal title: Soft Computing

Year: 2021

ISSN: 1433-7479, 1432-7643

DOI: https://doi.org/10.1007/s00500-021-05973-1